Genetics Selection Evolution — Latest Matching Preprints

1

PEC: a robust algorithm to reconcile pedigree and SNP-chip data on the basis of LD block, haplotype information, and Mendelian conflicts

Fu, C.; Mei, Q.; Miao, Y.; Xiang, T.

2026-06-04 genetics 10.64898/2026.06.01.729286 medRxiv

Top 0.1%

61.2%

Motivation: Pedigree errors frequently occur in livestock populations due to long-term manual record-keeping, which reduces the efficiency of breeding programs. Although several pedigree correction methods exist, their practical application is often limited by complicated procedures, high computational cost, and insufficient accuracy. Therefore, an effective and efficient solution for pedigree error correction is needed. Results: We developed a new algorithm and software, PEC, to accurately and efficiently correct pedigree errors. The method matches haplotype fragments between candidate parents and offspring using estimated linkage disequilibrium patterns and subsequently checks for Mendelian conflicts to adjust the pedigree. Using simulated pig datasets, we compared PEC against SeekParentF90 and AlphaAssign in terms of accuracy, memory usage, and computation time. PEC demonstrated superior performance across all metrics. Furthermore, application of single-step genomic best linear unbiased prediction (ssGBLUP) in a real pig population showed that PEC-corrected pedigrees significantly improved the accuracy and unbiasedness of genomic evaluations, highlighting the importance of pedigree error correction. Availability: The PEC software is freely available at https://github.com/TXiang-lab/JPEC.
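As an illustration of the Mendelian-conflict step mentioned in this abstract, the sketch below counts opposing homozygotes between a candidate parent and an offspring. It is not taken from the PEC software: the function name `count_mendelian_conflicts`, the 0/1/2 genotype coding, and the toy genotypes are all assumptions made for the example.

```python
# Hypothetical helper, not part of the PEC software.
# Genotypes are coded as allele counts (0, 1, 2); None marks a missing call.

def count_mendelian_conflicts(parent, offspring):
    """Return (conflicts, compared) for one candidate parent-offspring pair."""
    if len(parent) != len(offspring):
        raise ValueError("genotype vectors must have the same length")
    conflicts = 0
    compared = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # skip loci with a missing call in either animal
        compared += 1
        # A parent homozygous for one allele cannot produce an offspring
        # homozygous for the other allele (opposing homozygotes).
        if (p == 0 and o == 2) or (p == 2 and o == 0):
            conflicts += 1
    return conflicts, compared


# Toy genotypes, invented for illustration only.
candidate_sire = [0, 2, 1, 2, 0, None]
offspring = [2, 2, 1, 0, 0, 1]
conflicts, compared = count_mendelian_conflicts(candidate_sire, offspring)
print(f"{conflicts} conflicts across {compared} comparable loci")
```

A real parentage check would compare the conflict rate against a threshold that allows for genotyping error; that threshold is a design choice not described in the abstract.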

2

Genetic Modeling of Dyadic Behavioral Traits: Implications for Estimation and Interpretation of Variance Components

Jiang, X.; Siegford, J.; Steibel, J. P.

2026-06-12 genetics 10.64898/2026.06.10.731434 medRxiv

Top 0.1%

60.1%

Studying the genomic control of dyadic social interactions is gaining traction in animal genetics. However, genetic modeling of social interactions poses several challenges, one of which is whether social interactions should be treated as dyadic traits or as aggregated traits at the individual level. In this study, we systematically compared two approaches, dyadic models using dyadic traits and marginal models using marginally aggregated traits, and derived the algebraic relationships between their variance components. As an application, we used a published dataset on post-mixing aggression in pigs, including both directed and undirected aggression records collected during the 9-hour period after mixing among 797 finishing pigs in 59 social groups, to show how model choice can affect variance estimation. Results showed that dyadic models can estimate genetic effects and permanent environmental effects by exploiting repeated dyadic interaction records, thereby enabling a more complete understanding of the sources of variation underlying social interactions. In contrast, marginal models can bias the estimation and interpretation of genetic components, as the aggregated genetic variance may be confounded with other variance components due to the aggregation of dyadic traits. Marginal models may also lead to overestimation of social group and residual variance. These results can provide useful guidance for choosing appropriate modeling strategies for social interaction traits.

3

An endogenous retrovirus insertion disrupting bovine ALKBH8 causes a failure-to-thrive syndrome with immunodeficiency associated with juvenile mortality in Brown Swiss cattle

Glatthard, S.; Kadri, N. K.; Seefried, F. R.; Voitl, L. R.; Weber, B. A.; Schwarzenbacher, H.; Meister, S. L.; Gurtner, C.; OGrady, J. F.; Osbahr, M.; Leonard, A. S.; Meylan, M.; Pausch, H.; Droegemueller, C.; Jacinto, J.

2026-07-10 genomics 10.64898/2026.07.09.737535 medRxiv

Top 0.1%

11.8%

The Brown Swiss (BS) cattle breed is one of the major Swiss dairy breeds. Intensive selection and the widespread use of few elite sires in artificial insemination have increased inbreeding and the occurrence of deleterious recessive alleles in the homozygous state. Analyzing life trajectories in large, genotyped cohorts can identify hidden recessive disorders that are difficult to detect using traditional case-control association testing. Long-read DNA sequencing enables precise detection of causal alleles, including structural variants. This study aimed to (1) identify cryptic recessive loci affecting rearing performance in Swiss BS cattle, (2) evaluate their impact on survival, (3) characterize the associated phenotype, (4) identify the causal variant using long-read whole-genome sequencing, and (5) assess its functional impact. Using Homozygous Haplotype Enrichment/Depletion (HHED) mapping, we identified a risk haplotype (BH39) on chromosome 15 spanning from 16,276,819 bp to 16,446,984 bp that was associated with increased juvenile mortality within the first 180 days of life when present in the homozygous state. The BH39 occurred at a frequency of approximately 4.5% in Swiss BS cattle and 5.3% in German and Austrian BS cattle, and homozygous carriers exhibited a significantly reduced first-year survival rate. Five females homozygous for BH39 underwent clinical examination. They all showed recurrent respiratory disease, impaired growth, poor body condition, rough hair coat, and brown-discolored teeth. Pathological examination revealed bronchopneumonia and eosinophilic enteritis. Clinicopathological findings indicated failure to thrive and immunodeficiency. Long-read WGS of two BH39 homozygous calves revealed a private homozygous coding variant that was in high linkage disequilibrium with BH39. The identified structural variant was an insertion of a large transposable element (10.4 kb ERVK[2-1-LTR]) into the third exon of ALKBH8 (NM_001080341.2 c.267_268indel). 
Full-length RNA sequencing of cerebellum and liver from a homozygous calf revealed that the endogenous retrovirus (ERV) insertion introduces a cryptic transcription termination signal, truncating ALKBH8 mRNA. This study demonstrates that exploring population-scale genomic data and mining thousands of life-history records, followed by veterinary follow-up evaluations and molecular genetic analyses, provides an effective strategy for identifying cryptic recessive disorders that shorten the lifespan of cattle. The findings provide strong evidence that the ERV insertion into the coding sequence of ALKBH8 represents a loss-of-function variant that causes a previously undescribed recessive disorder that results in increased rearing loss. Interpretive summary: We identified a recessive disorder in Brown Swiss cattle that causes retarded growth, recurrent infections, immunodeficiency, and increased mortality during the first year of life. Using population-scale genomic data, clinical investigations, and long-read sequencing, we linked the disorder to an exonic transposable element insertion disrupting ALKBH8. The identification of the causal variant now enables direct genetic testing and the implementation of genome-based mating strategies to avoid carrier-by-carrier matings and, consequently, prevent the birth of affected homozygous offspring. We demonstrate the utility of integrating large-scale breeding records, veterinary phenotyping, and advanced genomics to identify hidden defects affecting livestock health and productivity.

4

Genomic insights into bacterial kidney disease resistance in Arctic charr (Salvelinus alpinus) via a 72k SNP array

Palaiokostas, C.; Jeuthe, H.; Nilsson, K. N.; Hallbom, H.; Axen, C.; Evensen, O.; Eriksson, S.; Johnsson, M.

2026-06-27 genetics 10.64898/2026.06.25.734482 medRxiv

Top 0.1%

8.2%

Selection for disease resistance forms one of the most highlighted areas of aquaculture breeding. A breeding program for Arctic charr has been operating in Sweden for over 40 years, making it the oldest of its kind worldwide for this species. However, the lack of available genomic resources prevented selection for any disease-resistance traits. A 72k Axiom SNP array was produced in this study and used to assess the potential to select for charr resistant to bacterial kidney disease (BKD), which is currently a major threat to the industry. Following a challenge experiment with Renibacterium salmoninarum, the causative agent of BKD, relevant phenotypic proxies were collected from approximately 2,000 charr. Thereafter, those animals were genotyped with the new 72k SNP array. The magnitude of the estimated variance components suggested potential for breeding for BKD resistance in charr, with relevant heritabilities ranging from 0.05 to 0.56 depending on the resistance proxy used. In addition, GWAS suggested that BKD resistance is a polygenic trait. Furthermore, genomic prediction approaches indicated that BKD-resistant animals can be identified using their SNP genotypes. Accuracies, expressed as Pearson correlation coefficients, when BKD resistance was analysed as a continuous trait, ranged from 0.42 to 0.52. In the scenario where BKD resistance was treated as a binary trait, the efficiency of genomic prediction was assessed using ROC curves, with an area under the curve of 0.72. Finally, no unfavourable correlations were found with growth traits. The developed 72k SNP array has the potential of being a pivotal tool for the Swedish Arctic charr breeding program. Moreover, our data support the use of genomic prediction in breeding BKD-resistant Arctic charr. As a critical next step, further validations in actual industry conditions would be required.
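The binary-trait evaluation in this abstract is summarized by the area under the ROC curve. As a minimal sketch of how that statistic can be computed, the function below uses the pairwise (Mann-Whitney) definition: the share of affected/unaffected pairs in which the affected animal has the higher predicted score, with ties counted as one half. The name `roc_auc` and the toy data are assumptions for illustration and do not come from the study.

```python
# Hypothetical illustration of AUC via the pairwise definition.

def roc_auc(labels, scores):
    """AUC for binary labels (1 = positive, 0 = negative) and predicted scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative record")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, invented for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(roc_auc(labels, scores), 3))
```

The double loop is quadratic in the number of records, which is fine for a small example; a rank-based implementation would be preferable for thousands of animals.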

5

The genetic architecture of milk urea concentration in dairy cattle differs across the lactation cycle

He, Q.; Vasiljevic, S.; Kadri, N.; Watson, N.; Stratz, P.; Mapel, X. M.; Leonard, A. S.; Seefried, F. R.; Pausch, H.

2026-04-24 genomics 10.64898/2026.04.22.719978 medRxiv

Top 0.1%

6.9%

Milk urea concentration (MUC) is an indicator of dietary protein utilization and nitrogen use efficiency in dairy cows. We performed genome-wide association studies (GWAS) on MUC in early, mid, and late lactation in the Holstein (HOL) and Brown Swiss (BSW) dairy cattle breeds using imputed sequence variants. We identified 11 and 17 independent quantitative trait loci (QTL) for MUC across the three lactation stages in BSW and HOL, respectively. While many of these QTL have previously been reported for MUC and other dairy traits, our study provides evidence that some QTL exert lactation-stage-specific effects. Our findings suggest that variants at the DGAT1 locus on BTA14 have pleiotropic effects on MUC and other dairy traits. This QTL showed an early lactation-specific association with MUC but impacted milk and fat yield across the entire lactation. We fine-mapped two QTL for MUC in early and mid-lactation in BSW on BTA9 (lead SNP: 9:21392941; Pcorrected = 1.1E-17) and BTA28 (lead SNP: 28:6518357; Pcorrected = 3E-11). We identified lncRNA ENSBTAG00000058688 and IBTK as positional and functional candidate genes for the BTA9 QTL, and KCNK1 as a positional and functional candidate gene that harbors a highly significant missense variant for the BTA28 QTL. In conclusion, our results shed light on the genetic architecture of MUC and highlight QTL harboring potential functional variants underpinning milk urea variation within and across breeds.

6

Genomic Inbreeding and Selection Signatures analyses in the Doberman Pinscher breed

Mulim-McCarthy, H.; Fragomeni, B.; Liu, S.; Rojas de Oliveira, H.

2026-06-08 genomics 10.64898/2026.06.04.730131 medRxiv

Top 0.1%

5.7%

The Doberman Pinscher population has undergone strong artificial selection for morphology and behavior, which can reduce genomic diversity and increase autozygosity. Here, we characterized the genome structure and identified selection signatures in Doberman Pinschers using complementary within- and between-population approaches. Genotypes from 3,226 Doberman dogs (Illumina CanineHD; 216,184 SNPs) provided by the Doberman Diversity Project were analyzed after purpose-specific quality control. Genomic inbreeding was quantified using four allele-frequency-based metrics and the runs of homozygosity (FROH) approach. Selection signatures were detected using intrapopulation (i.e., Runs of Homozygosity--ROH; Integrated Haplotype Score--iHS; and Number of Segregating Sites by Length--nSL) and interpopulation methods (i.e., Fixation Index--FST; Cross-Population Extended Haplotype Homozygosity--XP-EHH; and Cross-Population Number of Segregating Sites by Length--XP-nSL) comparing the Doberman Pinscher breed to Labrador Retriever (n=237). Dobermans showed high overall inbreeding, with a mean FROH of 0.42 (range 0.22-0.68), whereas the allele-frequency-based inbreeding estimators had similar means (~0.04). The partitioning of the ROH indicated high contributions from medium-to-long ROHs, consistent with recent inbreeding. The ROH scans identified 39,512 SNPs in ROH islands (≥50% frequency across individuals), with notable concentrations on CFA2, CFA3, and CFA31. Haplotype-based scans identified 2,820 candidate iHS SNPs and 2,173 candidate nSL SNPs (|score|>2). A common set of 310 SNPs was shared among ROH, iHS, and nSL, mapping near 279 genes that were mostly enriched for developmental pathways, particularly neurodevelopment and neuron-related cellular components. Between breeds, 349 highly differentiated SNPs were detected by FST, while XP-EHH and XP-nSL highlighted over 1,000 Doberman-specific haplotype signals. 
A total of seven SNPs overlapped across FST, XP-EHH, and XP-nSL, which were located mainly on CFA8 (~59.48-60.61 Mb) near the KCNK10, SPATA7, PTPN21, NEGR1, and BTG1 genes. These genes are mainly linked to neural development and signaling, but BTG1 has also been associated with cardiomyocyte cell-cycle regulation, and KCNK10 with cardiac excitability and remodeling. Overall, the Doberman Pinscher breed exhibits high genome-wide autozygosity and levels of inbreeding. In addition, our results showed consistent, multi-method evidence of selection at loci associated with neurodevelopmental and regulatory pathways. These findings provide prioritized targets for follow-up studies that integrate phenotypes relevant to breed health and performance.
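For readers unfamiliar with FROH, the sketch below shows the usual definition: the summed length of an individual's runs of homozygosity divided by the length of the autosomal genome covered by markers. The abstract does not state the exact formula or ROH-calling settings used, so the function `f_roh`, the segment coordinates, and the genome length are all hypothetical.

```python
# Hypothetical illustration of the standard F_ROH ratio; all values are invented.

def f_roh(roh_segments, autosome_length_bp):
    """F_ROH = total ROH length / autosomal length covered by markers.

    roh_segments: iterable of (start_bp, end_bp) tuples for one individual,
                  assumed non-overlapping.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome length must be positive")
    total_roh = 0
    for start, end in roh_segments:
        if end < start:
            raise ValueError("segment end precedes start")
        total_roh += end - start
    return total_roh / autosome_length_bp


# Toy individual with two ROH segments on a 100 Mb "genome".
segments = [(1_000_000, 6_000_000), (20_000_000, 32_000_000)]
print(f_roh(segments, 100_000_000))
```

Partitioning the same segments by length class (short, medium, long) before summing gives the kind of breakdown the abstract uses to distinguish recent from ancient inbreeding.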

7

Haplotype assembly without parental sequencing: Genotype-based trio-binning (GT-Trio)

Hettasch, T. J.; Gjuvsland, A. B.; Kent, M. P.; Grove, H.; Vage, D. I.

2026-06-11 genomics 10.64898/2026.06.08.729486 medRxiv

Top 0.1%

5.4%

Trio-binning is a robust method for haplotype-resolved assembly, providing the most accurate representation of diploid genomes including complex and haplotype-specific variation. Conventional trio-binning methods depend on parental short-read sequences to differentiate offspring reads originating from the maternal and paternal haplotypes. Here, we present a genotype-based trio-binning pipeline (GT-Trio) which reconstructs parent sequences from phased parental genotypes and uses these as an alternative source of parental information for haplotype assembly. The GT-Trio pipeline was applied to assemble the maternal and paternal haplotypes of three Norwegian Red (NR) cattle individuals, using phased parental genotypes imputed from array to sequence as input. Haplotypes assembled with GT-Trio using all sequence variants as parental input demonstrated assembly quality and phasing accuracy comparable to that achieved with conventional trio-binning. Using lower-density subsets of array SNPs led to a slight reduction in accuracy of haplotype separation, accompanied by an increase in size, contiguity and completeness, suggesting a trade-off between assembly quality and phasing accuracy associated with the density of parental genotypes provided as input to the pipeline. Overall, GT-Trio provides a scalable framework for haplotype assembly without parental sequencing and will be applicable in livestock species where genotyping and imputation are performed routinely. The GT-Trio pipeline is available at https://github.com/theahettasch/GT-Trio.

8

Temporal changes in allele frequency facilitate detection of adaptive variants in winter wheat (Triticum aestivum L.) breeding programs

Johansen, N. H.; Sarup, P.; Hansen, P.; Orabi, J.; Jahoor, A.; Ramstein, G. P.

2026-05-04 genetics 10.64898/2026.04.30.721918 medRxiv

Top 0.1%

3.2%

In quantitative genetics, candidate SNPs are identified through genotype-phenotype associations inferred with genome-wide association studies (GWAS). In this study, we explore an alternative approach to detect genetic variants with non-neutral effects by tracking temporal trends in allele frequency in a winter wheat (Triticum aestivum L.) breeding population over an eight-year period, from which signals of selection may be inferred. Selection signatures were inferred with a generalized linear model, where we modeled trends in allele frequency as a function of time (crossing year). These signatures of selection were used to prioritize variants. Associations between phenotypic performance and individual load of prioritized variants were then investigated. Furthermore, we assessed whether incorporating selection information into a genomic best linear unbiased prediction (GBLUP) model improves model performance in terms of quality of fit and prediction ability. Our findings indicate that the inferred signals of selection are effective in identifying non-neutral variants. Variants under strong negative selection were associated with a decrease in protein content adjusted for grain yield (p-value < 0.01), while genetic variants that had been under moderate to high levels of positive selection were associated with increased grain yield (p-value < 0.01). However, incorporating selection information did not improve prediction accuracy. In conclusion, temporal trends in allele frequency can be used to detect non-neutral variants. The proposed approach may hence complement traditional quantitative genetic methods for detecting non-neutral genetic variation. This approach may allow breeders to detect non-neutral variants earlier in the breeding cycle, without resorting to phenotypic data.

9

Analysis of genetic variation in the bovine Mannose Receptor gene (MRC1), its influence on receptor expression, and a potential association with resistance to bovine tuberculosis

Holder, A.; Kolakowski, J. F.; Usher, E.; Tzelos, T.; Connelley, T. K.; Shabbir, M. Z.; Gibson, A. J.; Harris, H.; Villarreal-Ramos, B.; Werling, D.

2026-07-03 immunology 10.64898/2026.06.27.734952 medRxiv

Top 0.1%

3.2%

Naturally occurring variation in the bovine mannose receptor C-type 1 gene (MRC1) may shape macrophage responses to Mycobacterium (M.) bovis, a key driver of bovine tuberculosis (bTB). We identified four coding region SNPs in MRC1 across Bos taurus (Holstein Friesian, Brown Swiss) and Bos indicus (Boran, Sahiwal) cattle breeds, including a non-synonymous variant, rs380943118 (c.2963G>A; Ser988Asn) in C-type lectin-like domain (CTLD) 6, most prevalent in Sahiwal cattle. Structural modelling suggested that the S988N substitution, which is spatially separated from the monosaccharide binding site of CTLD4, might indirectly affect glycan binding, perhaps through a conformational change in the receptor. Monocyte-derived macrophages upregulated MR expression during differentiation, with heterozygous (G/A) animals showing higher MR expression and increased uptake of GFP-M. bovis BCG, although differences were not statistically significant. Anti-CD206 blockade did not inhibit BCG internalization, either indicating that this specific antibody did not bind to a CTLD involved in ligand binding or that MR is not the sole entry receptor. These results highlight naturally occurring MRC1 polymorphisms that may influence MR structure and macrophage function, providing a foundation for future studies to assess their role in bTB susceptibility.

10

Genome-wide meQTL mapping in cattle blood reveals cis and trans regulation of DNA methylation

Fouere, C.; Costes, V.; Besnard, F.; Le Danvic, C.; Patry, C.; Fritz, S.; Boussaha, M.; Jouin, M.; Boichard, D.; Kiefer, H.; Costa Monteiro Moreira, G.; Sanchez, M.-P.

2026-07-08 genetics 10.64898/2026.07.07.736355 medRxiv

Top 0.1%

3.1%

Background: Complex traits are influenced by numerous variants, most of which have regulatory effects on gene expression that can be mediated by DNA methylation. Molecular QTL mapping is an approach that aims to dissect these effects. However, obtaining molecular phenotypes on a large scale is challenging, particularly in livestock species. In cattle, an epigenotyping array called EpiChip has recently been developed in the European RUMIGEN project. The EpiChip, which contains 43,317 CpG sites distributed all over the bovine genome, enables large-scale measurement of DNA methylation. This study aims to characterize the genetic determinism of blood DNA methylation in cows by estimating heritability and mapping cis- and trans-methylation QTLs (meQTLs). Results: Whole blood samples from 4,457 genotyped Holstein cows were epigenotyped. Across all CpG sites, the heritability estimates averaged 24.6%. The local meQTL mapping at sequence level for variable CpG sites (SD > 2.5%; n = 28,806) detected cis-meQTLs for 80.1% of the CpG sites, with sentinel SNPs located close to their associated CpGs. A two-step analysis was also conducted to identify long-range associations, with a particular focus on trans-meQTL hotspots. First, we identified CpG-SNP trans-associations using medium-density genotypes (50k SNPs) that revealed 31,846 SNPs with significant effects on 1 to 530 trans-CpG sites. Then, regions associated with at least 34 independent trans-CpGs were retained, defining 31 hotspots. For each hotspot, a local sequence-level GWAS was conducted using the first principal component derived from the associated trans-CpGs. Out of the 31 detected hotspots, three were located close to transcription factor genes (RUNX1, NFIC and FOXA3) for which the associated trans-CpGs were enriched for the corresponding binding motif. 
Two other hotspots were located within KDM5A and KDM5B, and their corresponding trans-CpGs were strongly overrepresented in H3K4me3 narrow peaks in blood as well as in other tissues. Conclusions: By identifying functional candidate genes associated with blood DNA methylation in cattle, these findings provide new insights into the regulatory architecture of DNA methylation in mammals, highlighting the value of large-scale molecular data from livestock populations.

11

QTL spanning the TGF-β2 locus is associated with muscle fiber hypertrophy in rainbow trout

Raghu, A.; Raymo, G.; Ahmed, R.; Ali, A. R.; Leeds, T.; Salem, M.

2026-05-27 genomics 10.64898/2026.05.24.727516 medRxiv

Top 0.1%

2.4%

Background: Skeletal muscle growth is a key determinant of body size and market value in salmonid aquaculture, yet the mechanisms linking genomic variation to muscle fiber hypertrophy remain poorly resolved. Myofiber cross-sectional area (CSA) provides a quantitative cellular proxy for fiber size and a direct link to macroscopic growth traits. Methods: We performed histological phenotyping of white skeletal muscle from rainbow trout (Oncorhynchus mykiss) representing divergent fillet-yield selection lines (ARS-FY-H and ARS-FY-L), quantifying mean myofiber CSA and fiber number using high-throughput image analysis. Genome-wide association analysis (GWAS) was conducted using low-pass whole-genome sequencing (~1x) with genotype imputation and functional variant annotation. RNA sequencing was performed on fish representing high and low CSA extremes to identify differentially expressed genes and enriched biological pathways. Results: Mean myofiber CSA was significantly associated with body weight, muscle weight, visceral weight, and body length (p < 0.05), while fiber count showed no significant association with most growth traits, implicating hypertrophy as the primary driver of muscle mass variation. GWAS identified a significant QTL spanning ~4.76 Mb on chromosome 2 (117 significant SNPs; Bonferroni-adjusted P ≤ 0.05; λ = 1.02). Associated variants were predominantly noncoding, enriched in intronic, intergenic, and enhancer-annotated regions. A high density of SNPs colocalized with the TGF-β2 locus, overlapping strong and genic enhancer elements in white muscle. Transcriptomic comparisons revealed that high-CSA muscle showed elevated expression of genes related to contractile function, cytoskeletal organization, and translation, while low-CSA muscle exhibited upregulation of extracellular matrix and immune-related genes consistent with a tissue remodeling state. 
Conclusions: Noncoding regulatory variation within a significant QTL spanning the TGF-β2 locus is associated with distinct transcriptional programs linked to muscle fiber hypertrophy in rainbow trout. By integrating genetic variation, chromatin-state annotation, and transcriptomic profiling, this study identifies candidate regulatory loci associated with variation in muscle cellularity and growth-related phenotypes in rainbow trout.

12

A telomere-to-telomere (T2T) pig genome assembly reveals Y chromosome diversity and structural variations of Wuzhishan pigs

Ren, Y.; Wang, F.; Li, X.; Liu, G.; Sun, R.; Zheng, X.; Zhang, Y.; Lin, R.; Lu, X.; Chen, L.; Xin, W.; Fei, Y.; Chao, Z.

2026-04-27 genomics 10.64898/2026.04.23.720499 medRxiv

Top 0.1%

2.2%

Background: Wuzhishan (WZS) pigs are native to Hainan Province, China, and serve as both important agricultural resources and biomedical models. Although a published WZS pig genome (T2T-pig1.0) has achieved telomere-to-telomere (T2T) completeness, substantial genetic diversity still exists within the breed, so another WZS pig genome, named WZS-T2T, was assembled in this study. Results: Multiple types of sequencing data were used for assembly, yielding a ~2.68 Gb telomere-to-telomere genome with an N50 length of ~142.87 Mb and 23,100 annotated protein-coding genes. Compared with T2T-pig1.0, WZS-T2T had higher QV and BUSCO values and a longer Y chromosome (ChrY). The ChrY of the two WZS pigs shared 11 genes, including the sex differentiation-related genes SHOX, PRKX, and DDX3X, as well as SRY; however, the energy metabolism gene SLC25A4 and the macrophage-related receptor gene CSF2RA on ChrY were specific to WZS-T2T. An inversion of ~33.86 Mb on chromosome 10 was identified between the two WZS pigs, and three lines of evidence were presented to support the sequence orientation in WZS-T2T. Genetic diversity was consistent with the rate of LD decay in the population comparison analysis. WZS pigs exhibited higher genetic diversity than the other four pig populations examined in this study (Tunchang, Yuxi black, Large White, and Duroc pigs) and showed slower LD decay than those four breeds. Conclusions: WZS-T2T therefore provides a higher-quality assembly, with potential benefits for both agricultural production and biomedical applications of WZS pigs.

13

Nitrogen use efficiency in pigs is associated with transcriptomic signatures related to amino acid metabolism, immune activity, and nutrient partitioning

Monney, B.; Ewaoluwagbemiga, E. O.; Kasper, C.

2026-07-01 genomics 10.64898/2026.06.26.733976 medRxiv

Top 0.1%

1.9%

Dietary protein restriction challenges the allocation of amino acids to growth and other physiological functions and therefore requires coordinated metabolic adaptation. Domestic pigs provide an informative system in which to study such responses, because nitrogen retention directly affects lean growth and can be quantified accurately under controlled feeding and housing conditions. Under reduced-protein diets, pigs differ in how effectively they retain nitrogen, and this variation has a genetic basis, making them well suited to investigate the molecular regulation of nitrogen use efficiency (NUE). Here, we characterise differential gene expression and enriched pathways in liver and skeletal muscle of more than 80 pigs with two divergent NUE phenotypes (high and low) maintained under the same protein-reduced, ad libitum dietary conditions. The two NUE phenotypes were clearly distinct at the transcriptomic level, with 177 differentially expressed genes in the liver and 133 in the muscle. In the liver, differential expression and enrichment analyses indicate reduced amino acid catabolism, lower inflammatory and detoxification activity, and a metabolic state that favours lipid processing and insulin-related regulation over the use of amino acids as energy sources. In skeletal muscle, they point to reduced lipid uptake, lower reliance on amino acid oxidation, and a greater emphasis on protein synthesis, translational regulation, mitochondrial energy metabolism, and growth-related processes. These gene-level patterns were supported and extended by pathway and gene-set enrichment analyses. Together, the results suggest that high and low-NUE pigs differ through coordinated, tissue-specific molecular adaptations. Overall, variation in NUE appears to reflect coordinated, tissue-specific differences in how nutrients are allocated between energy use, storage, and lean tissue growth.

14

kinference: Pairwise kinship detection for Close-Kin Mark-Recapture

Bravington, M. V.; Baylis, S. M.; Eveson, P.; Feutry, P.

2026-05-21 genetics 10.64898/2026.05.18.725841 medRxiv

Top 0.1%

1.7%

Close-Kin Mark-Recapture (CKMR) is a statistical framework for estimating demographic parameters of wild populations. Instead of recapturing individuals, it relies on the identification of closely related pairs such as parents and offspring, or siblings. By measuring how often such close kin are "recaptured" among sampled animals (whether alive or dead), scientists can estimate demographic parameters such as census size, mortality rates, and connectivity. CKMR is starting to change fisheries and wildlife management by giving more reliable demographic information, even for many species that resist conventional approaches. Here we introduce the kinference R package, which provides a set of tools for finding close-kin pairs among thousands of samples each genotyped at thousands of SNPs, and for associated quality control. The CKMR context implies requirements and assumptions that differ from those of many other kinship programs. In particular, kinference accounts empirically for linkage without requiring a genome assembly, is able to estimate and control false-negative and false-positive probabilities, and can cope with null alleles. The package has been developed and used in numerous CKMR projects since 2017. This paper documents the assumptions, statistical algorithms, and intended workflow for kinference.

15

Variation in AMY2B Copy Number and Serum Amylase Activity in Wolves (Canis lupus), Brown Bears (Ursus arctos), and Red Foxes (Vulpes vulpes) from Bosnia and Herzegovina

Katica, J.; Crnkic, C.; Kavazovic, A.; Tahirovic, D.; Pojskic, N.; Skapur, V.; Koro-Spahic, A.; Varatanovic, M.; Goletic, T.

2026-07-14 genetics 10.64898/2026.07.09.737415 medRxiv

Top 0.1%

1.5%

The AMY2B gene encodes pancreatic amylase, a critical enzyme for starch digestion. While previous studies have examined AMY2B copy number variation (CNV) in domestic and some wild animals, less is known about wild carnivores inhabiting regions with limited anthropogenic starch exposure. We analyzed blood samples for serum amylase activity and copy number variation in the AMY2B gene from 8 wolves (Canis lupus), 11 brown bears (Ursus arctos), and 3 red foxes (Vulpes vulpes) from Bosnia and Herzegovina. AMY2B gene copy number was assessed using droplet digital PCR (ddPCR), and serum amylase activity and glucose levels were quantified. Although the number of fox samples was limited, foxes and wolves consistently harbored two copies of AMY2B, while brown bears exhibited higher and more variable copy numbers (3.67-8.40, mean 5.88). Serum amylase activity was highest in foxes, moderate in wolves, and variable but lower in bears. Despite differences in AMY2B copy number and serum amylase activity, circulating glucose concentrations did not differ significantly among species. Our findings suggest that variation in AMY2B copy number among wild carnivores may be associated with species-specific evolutionary histories and dietary adaptations, providing insight into genomic mechanisms underlying carbohydrate utilization in natural populations.

16

Two-tower models for genomic prediction of reproductive outcomes and sex-specific fertility liabilities: simulation insights

Pappas, F.; Palaiokostas, C.; Debes, P. V.; Johnsson, M.

2026-07-09 genetics 10.64898/2026.07.03.736358 medRxiv

Top 0.1%

1.4%

Many biological characteristics arise from interactions between more than one biological organism or unit. Fertilization success in sexually reproducing species represents such an extended phenotype, where both mates are required to be fertile for a successful outcome. Consequently, predictive models should account for the joint nature of reproductive performance while offering interpretable estimates of individual mate contributions. Recent advances in genomics and machine learning (ML) provide standardized, high-dimensional genetic information on the one hand and computational tools capable of modeling complex biological systems on the other. Here, we construct and evaluate two-tower (TT) machine learning architectures for genomic prediction of binary reproductive outcomes and recovery of sex-specific fertility liabilities. Simulated datasets, generated under a range of genetic architectures, were used to compare multilayer perceptron (TT-MLP), convolutional neural network (TT-CNN), and L1-regularized linear (TT-LASSO) two-tower models. Simulation scenarios varied sex-specific heritabilities, genetic correlations, infertility prevalence, mating structure, and sex-specific infertility rates. Models were evaluated on their ability to predict reproductive success at the pair level and to recover the true underlying genetic values for male and female fertility. Prediction accuracy increased with the underlying heritable component, as expected, while sex-specific tower scores successfully recovered latent fertility liabilities despite models being trained only on observed joint outcomes. TT-LASSO achieved the highest overall classification performance, whereas TT-MLP provided more balanced and consistent recovery of sex-specific genetic values across scenarios. An additional simulation incorporating genotype-dependent mate compatibility demonstrated advantages of fully connected neural networks for capturing non-additive interactions. These results indicate that two-tower frameworks provide a powerful approach for modeling reproductive traits, enabling simultaneous prediction of aggregate reproductive outcomes and sex-specific fertility liabilities from genotypic information.
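The two-tower idea can be illustrated with a forward pass in Python. This is a sketch under stated assumptions, not the authors' implementation: each tower is a plain linear map (loosely in the spirit of TT-LASSO, without the L1 penalty or any training), and the towers are combined by multiplying per-sex fertility probabilities, which is one assumed way of encoding "both mates must be fertile". The abstract does not specify the combination rule, and all function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tower_score(genotypes: Sequence[float], weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Linear tower: maps one mate's marker genotypes to a scalar liability score."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return bias + sum(g * w for g, w in zip(genotypes, weights))


def pair_success_probability(male_genotypes: Sequence[float],
                             female_genotypes: Sequence[float],
                             male_weights: Sequence[float],
                             female_weights: Sequence[float]) -> float:
    """Probability of a successful mating for one pair.

    Assumed combination rule: the pair succeeds only if both mates are
    fertile, and the two fertility events are treated as independent.
    """
    p_male = sigmoid(tower_score(male_genotypes, male_weights))
    p_female = sigmoid(tower_score(female_genotypes, female_weights))
    return p_male * p_female


if __name__ == "__main__":
    # Genotypes coded 0/1/2; weights are made-up illustrative values.
    male, female = [0, 1, 2], [2, 1, 0]
    print(pair_success_probability(male, female, [0.5, -0.2, 0.1], [0.3, 0.0, -0.4]))
```

Because only the product is observed in training data, the separate tower scores are what would carry the sex-specific liabilities that the study tries to recover.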

17

Enhancing predictive accuracy of yield traits in cassava through multi-trait genomic prediction

de Freitas, G. M.; Certuche, D. S.; Jannink, J.-L.; de Oliveira, E. J.; Garcia, A. A. F.

2026-07-06 genetics 10.64898/2026.07.01.735838 medRxiv

Top 0.2%

1.1%

Multi-trait genomic prediction offers a practical route to improve selection for costly, complex traits in clonally propagated crops such as cassava. In a Brazilian breeding panel of 1,078 cassava clones genotyped with 25,923 SNPs and phenotyped for six agronomic traits, we compared single-trait (ST) and multi-trait (MT) GBLUP models. Stage-wise mixed models produced BLUEs that fed into ST and MT-GBLUP. We tested five cross-validation schemes that mimic breeder realities: ST baseline (CV1); naive all-traits MT prediction for unphenotyped candidates (CV2); MT prediction using auxiliary trait phenotypes in the test set (CV3); and two sparse-phenotyping regimes with missingness by trait (CV4) or by clone (CV5) at 25%, 50%, and 75% levels. Under the ST baseline (CV1), predictive ability ranged from 0.50 for DMC and 0.45 for FRY down to 0.13 for Le.Dis. A naive full MT model (CV2) performed approximately on par with ST-GBLUP. In contrast, MT designs (CV3) that included informative auxiliary traits, such as shoot yield and combinations with plant vigor and leaf disease severity, yielded small gains for DMC, with predictive ability of approximately 0.51 (+2%), while FRY predictive ability increased to approximately 0.65 (+44%), accompanied by RMSE reductions for FRY of up to approximately 13.5% (e.g. RMSE approximately 6.2). Sparse-phenotyping simulations (CV4/CV5) demonstrated that MT models sustain or even improve predictive ability under realistic missing-data regimes (PA ≈ 0.62-0.65). Selection concordance between MT and ST top-10% sets was generally high (>0.80), and MT configurations produced measurable improvements in expected selection response and genetic gain per cycle for several target traits. These results indicate that strategically implemented MT-GBLUP, using a small set of biologically and operationally informative auxiliary traits and optimized sparse phenotyping, can materially increase predictive accuracy and selection efficiency for economically critical cassava traits while reducing phenotyping burden.
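The reported percentage gains and the top-10% concordance can be reproduced with a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and the concordance definition (share of one top set that also appears in the other) is an assumption, since the abstract does not state how concordance was computed.

```python
from typing import Dict


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


def top_fraction_concordance(scores_a: Dict[str, float],
                             scores_b: Dict[str, float],
                             fraction: float = 0.10) -> float:
    """Share of the top `fraction` of clones under scores_a that is also
    in the top `fraction` under scores_b (assumed definition)."""
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both score sets must cover the same clones")
    n_top = max(1, round(fraction * len(scores_a)))
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:n_top])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:n_top])
    return len(top_a & top_b) / n_top


if __name__ == "__main__":
    # Predictive abilities quoted in the abstract (ST baseline vs. MT with auxiliary traits).
    print(round(relative_gain_percent(0.45, 0.65)))  # FRY
    print(round(relative_gain_percent(0.50, 0.51)))  # DMC
```

With the abstract's figures, the two calls give 44 and 2, matching the reported +44% for FRY and +2% for DMC.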

18

A gapless Landrace pig genome resolves centromeres and telomeres and highlights telomere repeat structures in different pig breeds

Grove, H.; Stenlokk, K. S. R.; Lien, S.; Gjuvsland, A. B.; Arnyasi, M.; van Son, M.; Kent, M.

2026-06-30 genomics 10.64898/2026.06.25.734473 medRxiv

Top 0.2%

1.0%

The Duroc-derived reference genome Sscrofa11.1 has provided a critical foundation for pig genomics, enabling accurate variant detection and comparative genomics, but it does not capture breed-specific variation. Here, we present a near-complete, gap-free genome assembly for the Landrace pig (Landrace_v1, GCA_963921485.1), spanning all 20 chromosomes and totaling 2.6 Gb, including 176 Mb of sequence absent from Sscrofa11.1. Comparative analyses with recently published high-quality pig genomes reveal a conserved centromere organization across breeds, accompanied by substantial variation in repeat composition and length, and identify a pig-specific pattern of telomere variant repeats across eight pig breeds. The improved resolution of repetitive regions in Landrace_v1 enables more complete reconstruction of complex gene families, including olfactory receptors, and uncovers structural variation at the KIT proto-oncogene receptor tyrosine kinase locus not represented in the Duroc reference. Together, these findings highlight the limitations of single-reference genomes and demonstrate the value of breed-specific assemblies for capturing genomic diversity and improving downstream analyses.

19

Increasing Phenomic Prediction Efficiency Using A Principal Component Analysis Based Pre-Processing Of Near Infrared Spectra

Bienvenu, C.; Roger, J.-M.; Sene, M.; Castro Pacheco, S. A.; Singer, M.; Felaniaina, B. L.; Terrier, N.; De Bellis, F.; Pot, D.; DE VERDAL, H.; Segura, V.

2026-05-13 genetics 10.64898/2026.05.10.724118 medRxiv

Top 0.2%

1.0%

Phenomic prediction (PP) is a breeding value prediction method using near infrared spectroscopy (NIRS). Spectra pre-processing is a key step in the analysis pipeline of PP and generally involves chemometrics methods. However, there is still little understanding in the genetics community of what pre-processing does and why it improves performance. Consequently, the choice of pre-processing is made either arbitrarily or through a search for the optimal set of methods and associated parameters. In this study, we propose a PCA-based pre-processing method where genetic values of spectra are estimated on a set of principal components instead of individual wavelengths. This way, estimations are based on a few informative and orthogonal features of spectra instead of many correlated, uninformative wavelengths. We tested this new pre-processing method on five data sets representing four plant species (maize, rice, sorghum and grapevine). Results show that it performs as well as, or better than, the best classical chemometric pre-processing methods in almost all cases. Combining PCA-based and classical chemometric pre-processing methods maximizes predictive ability. Moreover, this pre-processing method opens up possibilities for better understanding and selecting the parts of the spectral information that are relevant for the prediction of breeding values. Indeed, components together representing about 1% of spectral variability were found to be responsible for most of PP predictive ability.

Plain language summary: Cultivated plants are the result of a breeding process during which their genetic values are used to select those to breed. Estimating breeding values requires heavy experimental means and is time consuming. Phenomic prediction is a low-cost, high-throughput genetic value estimation method that is increasingly being used. It often uses near infrared spectroscopy measurements as predictors of genetic values; these measurements are easy to collect and thus routinely used in many species. However, near infrared spectra generally require pre-processing before being used in prediction. Currently used pre-processing methods arise from the chemometrics community and are not yet well understood by geneticists. In this study, we propose a new pre-processing approach that performs as well as or better than the best chemometric pre-processing generally used, reduces computation time, and allows for a better understanding of what parts of spectral information are relevant for prediction.

Core Ideas:
- Working on principal components of spectra instead of wavelengths increases the predictive ability of phenomic prediction and performs as well as or better than classical chemometrics pre-processing
- Working on principal components of spectra requires less optimization of parameters than chemometrics pre-processing
- About 1% of spectral variance is responsible for most of the predictive power of phenomic prediction
- Working on principal components of spectra pre-processed with classical chemometrics pre-processing can increase predictive ability even more
- PCA-based methods are valuable to optimize the predictive ability of phenomic prediction and could be used more widely in the quantitative genetics field
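As a rough illustration of working on principal components rather than wavelengths, the Python sketch below centers a small spectra matrix (rows are samples, columns are wavelengths) and projects it onto its first principal component. It covers only the projection step, not the estimation of genetic values, and it uses power iteration as a dependency-free stand-in for a full PCA; in practice a linear algebra library would be used. All names are hypothetical.

```python
import math
from typing import List, Sequence


def center_columns(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract each wavelength's mean so that columns have mean zero."""
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]


def first_principal_component(centered: Sequence[Sequence[float]],
                              n_iter: int = 100) -> List[float]:
    """Unit-length leading eigenvector of X^T X, found by power iteration.

    Caveat: the uniform starting vector must not be orthogonal to the
    leading component, otherwise the iteration cannot find it.
    """
    n_wavelengths = len(centered[0])
    vector = [1.0 / math.sqrt(n_wavelengths)] * n_wavelengths
    for _ in range(n_iter):
        scores = [sum(x * v for x, v in zip(row, vector)) for row in centered]  # X v
        new_vector = [sum(s * row[j] for s, row in zip(scores, centered))
                      for j in range(n_wavelengths)]                            # X^T (X v)
        norm = math.sqrt(sum(v * v for v in new_vector))
        if norm == 0.0:
            raise ValueError("no variance along the starting direction")
        vector = [v / norm for v in new_vector]
    return vector


def component_scores(centered: Sequence[Sequence[float]],
                     component: Sequence[float]) -> List[float]:
    """Project each centered spectrum onto the component."""
    return [sum(x * v for x, v in zip(row, component)) for row in centered]


if __name__ == "__main__":
    # Toy "spectra" with two wavelengths, all lying along the direction (3, 4).
    spectra = [[3.0, 4.0], [6.0, 8.0], [9.0, 12.0]]
    centered = center_columns(spectra)
    pc1 = first_principal_component(centered)
    print(pc1, component_scores(centered, pc1))
```

For this toy input the component is approximately (0.6, 0.8) and the scores approximately -5, 0 and 5; those scores, rather than the raw wavelengths, would then be the features on which genetic values are estimated.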

20

Multispecies Mixtures: An Individual-Centered Quantitative Genetic Framework for Complex Plant Neighborhoods

Salas, N.; Montazeaud, G.; Bourke, P. M.; Baranger, A.; David, J.

2026-05-29 genetics 10.64898/2026.05.27.728303 medRxiv

Top 0.2%

1.0%

Modern agriculture faces major sustainability challenges, including stagnating yields, dependence on fossil resources, and severe environmental impacts. Increasing intra- and interspecific diversity within plots through agroecological design is a promising method for enhancing crop productivity and stability. However, mixed-crop performance remains highly variable, and the genetic architecture of interactions within heterogeneous canopies is poorly understood. Two quantitative genetic frameworks have been proposed: trait-based models, which describe how interacting traits shape phenotypes, and variance-based models, which treat neighbor genotype effects as "black-box" social effects. However, existing variance-based models have been developed almost exclusively for intraspecific interactions and simple neighborhoods. We propose a general multispecies framework describing how a focal plant's phenotype and total breeding value arise from its own direct effects and from the indirect effects of conspecific and heterospecific neighbors. We derived analytical expressions for phenotypic variance, inter-individual covariance, total breeding value variance, and relative heritable variance, which explicitly account for spatial structure, relatedness, and environmental similarities. Using a two-species alternating-row field layout and extensive simulations based on flexible variance-covariance structures, we evaluated the statistical power and bias of joint mixed-model estimators of direct and indirect genetic and environmental effects under a wide range of parameter combinations. Our results show that accurate separation of direct and indirect effects depends on trait heritability and replication, and that modeling genetic covariances across effects and species substantially improves estimation accuracy. This framework provides a unified, individual-centered basis for analyzing complex multispecies neighborhoods and quantifying the breeding potential of plant communities.

Article Summary: Growing several crop species or varieties together in the same field can boost yield and stability, but the outcome is unpredictable and the genetic causes remain unclear. We developed a theoretical and statistical framework that links each plant's performance to its own genes and to those of its neighbors, both from the same and from a different species. Computer simulations of a two-species field showed that these direct and neighbor-driven genetic effects can be reliably separated when enough plants are measured per variety. The framework opens the way to breeding crop mixtures that perform well specifically when grown alongside another species.
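The individual-centered view can be sketched in Python for an alternating-row layout. This is a toy illustration under assumptions not taken from the paper: species alternate by row, only the four adjacent plants count as neighbors, each neighbor contributes one additive indirect effect that depends on whether it is conspecific or heterospecific, and environmental and residual terms are omitted. The `Plant` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plant:
    direct: float        # effect of the plant's own genotype on its own phenotype
    indirect_con: float  # effect it exerts on conspecific neighbors
    indirect_het: float  # effect it exerts on heterospecific neighbors


def species_of_row(row: int) -> str:
    """Alternating-row layout: even rows hold species A, odd rows species B."""
    return "A" if row % 2 == 0 else "B"


def focal_phenotype(grid: List[List[Plant]], row: int, col: int) -> float:
    """Direct effect of the focal plant plus indirect effects of its neighbors."""
    value = grid[row][col].direct
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
            neighbor = grid[r][c]
            same_species = species_of_row(r) == species_of_row(row)
            value += neighbor.indirect_con if same_species else neighbor.indirect_het
    return value


if __name__ == "__main__":
    # 3 x 3 plot of identical plants: conspecific neighbors compete (-1.0),
    # heterospecific neighbors facilitate (+0.5). Values are made up.
    grid = [[Plant(10.0, -1.0, 0.5) for _ in range(3)] for _ in range(3)]
    print(focal_phenotype(grid, 1, 1), focal_phenotype(grid, 0, 0))
```

In this layout the within-row neighbors are conspecific and the across-row neighbors heterospecific, so the center plant scores 9.0 and the corner plant 9.5.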